Armv7-M: Allow register overlap in ldm + ldrd #153
base: main
Thanks!
Can you please add a new example to test that this works.
A simple
ldrd r0, r1, [r0]
ldm r0, {r0-r3}
should do.
Or simply extend |
Thanks for your changes. Almost there.
slothy/targets/arm_v7m/arch_v7m.py
Outdated
@@ -1486,7 +1486,7 @@ def make(cls, src):
 obj.increment = None
 obj.pre_index = 0
 obj.addr = obj.args_in[0]
-obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
+#obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
Please remove those, not comment them out.
Also we need to test if this affects any other examples in SLOTHY.
For that please make sure you have a clean copy of SLOTHY, and then run
python3 example.py --timeout 60 --only-target=slothy.targets.arm_v7m.cortex_m7
This is going to run for a few hours. Then zip up the output files in examples/opt/armv7m
and attach them to this PR.
0e257f4 to 8871294
Previously, ldm and ldrd fusion would break if the address register was also one of the outputs (and not the last output). This commit fixes that by changing the fusion so that, when there is an overlap, the ldr that overwrites the address is re-ordered to the very end. This is not needed for stm/strd, as there cannot be an overlap there. Additionally, this commit removes the unnecessary restrictions disallowing Rd=Ra for ldrb/ldrh/ldr.
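To illustrate the ordering constraint, here is a minimal sketch of splitting an ldm into single ldr instructions. It assumes a hypothetical helper `split_ldm` that works on plain strings; it is not SLOTHY's actual implementation or API. The only point is that the load which overwrites the address register must come last.

```python
def split_ldm(base, regs):
    """Split `ldm base, {regs}` into single ldr instructions.

    Hypothetical helper for illustration only (not part of SLOTHY).
    The i-th register in the list is loaded from [base, #4*i]. If the
    base register is also a destination, its load is emitted last so
    that the address stays valid for all other loads.
    """
    loads = [(reg, 4 * i) for i, reg in enumerate(regs)]
    # Loads that do not touch the base register keep their order ...
    safe = [(reg, off) for reg, off in loads if reg != base]
    # ... and the load overwriting the base register goes to the very end.
    clobber = [(reg, off) for reg, off in loads if reg == base]
    return [f"ldr {reg}, [{base}, #{off}]" for reg, off in safe + clobber]


for line in split_ldm("r0", ["r0", "r1", "r2", "r3"]):
    print(line)
```

For `ldm r0, {r0-r3}` this emits the loads of r1, r2 and r3 first and the load of r0 last. Without overlap the original order is kept.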
I cleaned this up, but I still need to run a full test with something like
(shorter timeout does not work for the dilithium ntt). |
I re-ran with a larger timeout (300 is still not enough for the dilithium ntt, but 600 seems fine on my machine). Unfortunately, fnt_257_dilithium_m7 fails the selftest:
This needs investigation before we can merge this. @dop-amin any ideas?
Fixed the splitting of ldrd and ldm when the address register and output register overlap in ldrd_imm_splitting_cb and ldm_interval_splitting_cb.
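The overlap case for ldrd can be checked with a small model. The sketch below is illustrative only: `split_ldrd` and `run` are hypothetical names, not the callbacks mentioned above, and the register file and memory are plain dictionaries. It shows that placing the load that overwrites the address register last gives the same result as the original `ldrd r0, r1, [r0]`.

```python
def split_ldrd(rt, rt2, rn, imm=0):
    """Split `ldrd rt, rt2, [rn, #imm]` into two (dst, base, offset) loads.

    Hypothetical helper for illustration only. If the first destination
    is also the address register, the two loads are swapped so that the
    address is overwritten last.
    """
    first = (rt, imm)
    second = (rt2, imm + 4)
    order = [second, first] if rt == rn else [first, second]
    return [(dst, rn, off) for dst, off in order]


def run(loads, regs, mem):
    """Execute the loads in order on a copy of the register file."""
    regs = dict(regs)
    for dst, base, off in loads:
        regs[dst] = mem[regs[base] + off]
    return regs


mem = {0x100: 11, 0x104: 22}
regs = {"r0": 0x100, "r1": 0}
result = run(split_ldrd("r0", "r1", "r0"), regs, mem)
print(result)
```

If the two loads were executed in the naive order instead, the second load would use the freshly loaded value in r0 as its address.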